A Review of Missing Data Handling Methods
Author
Abstract
Most real-world datasets suffer from missing data, which can lead data mining analysts to draw wrong inferences about the data under study. Many researchers are developing more sophisticated methods to address this problem. Even though many methods are available, analysts have difficulty finding a suitable one because they lack knowledge of the methods and their applicability. To bridge this gap, this paper gives a brief overview of the review papers on missing values published during the last 10 years. It discusses the methods compared in that literature and the observations their authors made. Finally, the techniques recommended in most of the literature are applied to real-world datasets, and the empirical results are analyzed.
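The abstract does not name the methods it compares. Reviews of missing data commonly discuss two baseline strategies: listwise deletion (complete-case analysis) and mean imputation. The Python sketch below applies both to a small dataset as a minimal illustration. The column names, values, and the helper functions `listwise_deletion` and `mean_imputation` are hypothetical and for illustration only. This is not the paper's implementation.

```python
import statistics
from typing import Optional

# Hypothetical toy data (not from the paper): None marks a missing value.
rows: list[dict[str, Optional[float]]] = [
    {"age": 25.0, "income": 30.0},
    {"age": 32.0, "income": None},
    {"age": 47.0, "income": 52.0},
    {"age": None, "income": 41.0},
    {"age": 51.0, "income": 60.0},
]

def listwise_deletion(data):
    """Complete-case analysis: keep only rows that have no missing values."""
    return [r for r in data if all(v is not None for v in r.values())]

def mean_imputation(data):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = data[0].keys()
    means = {
        c: statistics.mean(r[c] for r in data if r[c] is not None) for c in columns
    }
    return [{c: (means[c] if r[c] is None else r[c]) for c in columns} for r in data]

complete = listwise_deletion(rows)
imputed = mean_imputation(rows)

observed_income = [r["income"] for r in rows if r["income"] is not None]
imputed_income = [r["income"] for r in imputed]

print(len(complete))  # → 3
print(statistics.mean(observed_income), statistics.mean(imputed_income))  # → 45.75 45.75
# Sample variance before and after imputation: the imputed column's variance is smaller.
print(statistics.variance(observed_income), statistics.variance(imputed_income))
```

Listwise deletion keeps only 3 of the 5 rows. Mean imputation keeps every row and leaves the income mean unchanged, but it lowers the sample variance because each imputed value sits exactly at the column mean. This variance shrinkage is one reason mean imputation is regarded as a weak quick fix.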
Similar resources
Application of Multiple Imputation in Medical and Epidemiological Research
Missing data, which arise for different reasons, are an unavoidable problem in epidemiological studies. The problem is widespread, and many methodologists therefore regard it as a challenge in research design and data analysis. Complete case analysis is often used in studies with missing data; however, this approach may result in inaccurate estimates and inferences due to bias associated...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Datasets of previous loan applicants are built from their records. Banks usually supplement these internal datasets with external credit bureau data and finally use them to estimate PD. Banks are also continually interested in using rule-based classifiers to b...
DEA with Missing Data: An Interval Data Assignment Approach
In classical data envelopment analysis (DEA) models, inputs and outputs are assumed to be known, and these models cannot handle unknown values directly. In recent years, there has been little research on handling missing data. This paper proposes a new interval-based approach to dealing with missing data, a modified version of the Kousmanen (2009) approach. First, the prop...
Review of the Methods for Handling Missing Data in Longitudinal Data Analysis
Even in well-controlled situations, missing data always occur in longitudinal data analysis. Missing data may degrade the performance of confidence intervals, reduce statistical power, and bias parameter estimates. In this paper, we review and discuss general approaches for handling missing data in longitudinal studies. We first illustrate the mechanisms of missing data. Then we focus on the methods ...
Comparison of the EM Algorithm and Common Methods for Imputing Missing Data: A Study on a Self-Medication Questionnaire for Diabetic Patients
Background and Objectives: Missing data are a major challenge in research. Depending on the type of study and variables, different methods have been proposed for handling such data. This study compared five popular imputation approaches for addressing missing data in questionnaires. Methods: In this study, 500 self-medication questionnaires from diabetic patients were used. Mi...
A Review of Missing Data Handling Methods in Education Research
Missing data are a common occurrence in survey-based research studies in education, and the way missing values are handled can significantly affect the results of analyses based on such data. Despite known problems with the performance of some missing data handling methods, such as mean imputation, many researchers in education continue to use those methods as a quick fix. This study reviews the cu...
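Several of the abstracts above note that complete case analysis can give biased estimates and that the missing-data mechanism matters. The following Python sketch makes that concrete with a simulated, hypothetical setup. Here `y` depends linearly on `x`, and `y` is deleted in one of two ways:

* completely at random (MCAR), or
* with a probability that depends only on the observed `x` (MAR).

The model, the missingness probabilities, and the `simulate` helper are assumptions chosen for illustration.

```python
import random
import statistics

def simulate(mechanism: str, n: int = 100_000, seed: int = 0) -> float:
    """Return the complete-case mean of y under a given missingness mechanism.

    Hypothetical model: x ~ N(0, 1) and y = 2x + N(0, 1) noise, so the true mean of y is 0.
    """
    rng = random.Random(seed)
    observed_y = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p_missing = 0.5                      # independent of both x and y
        elif mechanism == "MAR":
            p_missing = 0.8 if x > 0 else 0.2    # depends only on the observed x
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        # Complete case analysis keeps y only when it was not deleted.
        if rng.random() >= p_missing:
            observed_y.append(y)
    return statistics.mean(observed_y)

mcar_mean = simulate("MCAR")
mar_mean = simulate("MAR")
print(f"MCAR complete-case mean of y: {mcar_mean:.3f}")
print(f"MAR  complete-case mean of y: {mar_mean:.3f}")
```

Under MCAR, the complete-case mean stays close to the true value of 0. Under MAR, it is pulled well below 0 (about -0.96 in expectation), because rows with positive `x`, and therefore larger `y`, are dropped more often. Methods that condition on `x`, such as multiple imputation, can in principle correct this kind of bias, whereas simply discarding incomplete cases cannot.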